[AURON #2366] fix: Handle Paimon metadata columns in V2 native scan by lyne7-sc · Pull Request #2367 · apache/auron

lyne7-sc · 2026-06-26T09:44:48Z

Which issue does this PR close?

Closes #2366.

Rationale for this change

Paimon metadata columns are produced by the Paimon scan layer rather than stored as physical columns in data files. The Paimon V2 native scan was passing these columns to the native Parquet/ORC reader as file columns, which can return incorrect values.

For example:

create table paimon.db.t_metadata (id int, v string) using paimon;
insert into paimon.db.t_metadata values (1, 'a');
select id, __paimon_file_path from paimon.db.t_metadata;

The native path returned null for __paimon_file_path, while Spark/Paimon's scan path returns the actual file path.

What changes are included in this PR?

Recognize Paimon metadata columns using PaimonMetadataColumn.
Materialize supported file-level metadata columns (__paimon_file_path, __paimon_bucket) as per-file constants.
Keep unsupported Paimon metadata columns on Spark/Paimon's scan path instead of reading them from Parquet/ORC files.
Cover metadata columns both with and without table partition columns.
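
The routing described in the list above can be sketched roughly as follows. This is a minimal, stdlib-only Java illustration (the PR itself is Scala), and every name in it (`ScanColumnRouting`, `Route`, `SUPPORTED`, `route`, `plan`) is hypothetical rather than taken from the Auron code base. It assumes case-insensitive name matching with lower-cased physical column names, and it assumes that a physical column whose name collides with a metadata name is read from the file like any other column. `__paimon_row_index` appears in the usage note only as an example of a name outside the supported set.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from the Auron code base.
final class ScanColumnRouting {
    enum Route { FILE_COLUMN, PER_FILE_CONSTANT, FALLBACK_TO_SPARK }

    // The two file-level metadata columns the PR description lists as supported.
    static final Set<String> SUPPORTED = Set.of("__paimon_file_path", "__paimon_bucket");

    // Assumption: physicalColumns holds lower-cased names of columns stored in data files.
    static Route route(String name, Set<String> physicalColumns) {
        String key = name.toLowerCase(Locale.ROOT);
        // A real column wins over a metadata column with the same name.
        if (physicalColumns.contains(key)) return Route.FILE_COLUMN;
        // Supported metadata is attached as a constant per data file.
        if (SUPPORTED.contains(key)) return Route.PER_FILE_CONSTANT;
        // Any other metadata-looking name is left to the Spark/Paimon scan path.
        if (key.startsWith("__paimon_")) return Route.FALLBACK_TO_SPARK;
        // Everything else is treated as an ordinary column in this sketch.
        return Route.FILE_COLUMN;
    }

    // Routes every projected column, preserving projection order.
    static Map<String, Route> plan(List<String> projection, Set<String> physicalColumns) {
        Map<String, Route> out = new LinkedHashMap<>();
        for (String name : projection) out.put(name, route(name, physicalColumns));
        return out;
    }
}
```

For the table in the example above, `ScanColumnRouting.plan(List.of("id", "__paimon_file_path"), Set.of("id", "v"))` routes `id` to the file reader and `__paimon_file_path` to a per-file constant, while a projection containing `__paimon_row_index` would be routed to the fallback.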

Are there any user-facing changes?

No API changes. This is a correctness fix for Paimon V2 native scan.

How was this patch tested?

Adds Paimon V2 integration tests

SteNicholas

@lyne7-sc, thanks for the fix! The overall approach is sound: materialize __paimon_file_path/__paimon_bucket as per-file constants via partitionSchema, and fall back to Spark for unsupported metadata columns. The functional Test Paimon 1.2 CI job (which runs the new integration tests) is green.

lyne7-sc · 2026-06-28T14:52:21Z

@SteNicholas Thanks for the careful review! I addressed the comments in the latest update, and the relevant CI is now green.

SteNicholas · 2026-06-29T05:55:19Z

@lyne7-sc, could you share your WeChat so that we can discuss this directly?

SteNicholas

@lyne7-sc, thanks for the updates. The approach of reusing the partition-constant mechanism, rather than reading these columns from Parquet/ORC, is clean. I found two correctness issues (verified against the decompiled Paimon 1.2.0 sources), plus a couple of value-fidelity edge cases and test-coverage gaps; inline comments below. I recommend addressing the two confirmed bugs (the __paimon_file_path encoding and the partition-key/metadata name collision) before merge.

Minor (not blocking): in toPartitionValueTemplate, SQLConf.get.resolver and indexByName = partitionKeys.zipWithIndex.toMap are split-invariant but rebuilt per split; and partitionKeys() is fetched twice (a Set at L131 and a Seq at L175). Worth hoisting into computePlan. Also consider whether file_path could be materialized on the executor (as NativeIcebergTableScanExec.metadataPartitionValues does) instead of baked into a per-file InternalRow on the driver.
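
The hoisting suggested here can be illustrated with a small Java sketch. It is not the Auron code: `PartitionIndexHoisting`, `indexByName`, and `resolve` are hypothetical names, and the sketch only shows the general idea of building the name-to-position map once per plan and doing cheap lookups per split.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of hoisting split-invariant state; names are illustrative only.
final class PartitionIndexHoisting {
    // Built once per plan: partition key name -> position in the partition row.
    static Map<String, Integer> indexByName(List<String> partitionKeys) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < partitionKeys.size(); i++) {
            index.put(partitionKeys.get(i), i);
        }
        return index;
    }

    // Called per split: only looks names up in the prebuilt index (-1 when absent).
    static List<Integer> resolve(List<String> fields, Map<String, Integer> index) {
        List<Integer> positions = new ArrayList<>();
        for (String field : fields) {
            positions.add(index.getOrDefault(field, -1));
        }
        return positions;
    }
}
```

A caller would compute `indexByName(partitionKeys)` a single time and pass the resulting map to `resolve` for each split, instead of rebuilding the map inside the per-split loop.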

SteNicholas

LGTM. The native Paimon V2 scan correctly handles __paimon_file_path and __paimon_bucket, falls back for unsupported metadata columns, and properly distinguishes physical/partition columns that collide with metadata names. The test coverage (multi-file splits, non-zero buckets, name collisions, partitioned tables, special-character partition values) is thorough.

One thing I verified closely: the __paimon_file_path value is built with new Path(rawFilePath).toUri.toString. This matches Paimon 1.2.0, whose PaimonRecordReaderIterator materializes the column via filePath().toUri().toString() (the percent-escaped form) — confirmed against the paimon-spark-3.x:1.2.0 artifact. So results agree with vanilla Paimon, including the '50%' partition case. (Note for the future: newer Paimon switched this to Path.toString()/unescaped, so if Auron ever bumps the Paimon dependency this rendering will need to follow.)
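
To make the escaped versus unescaped distinction concrete, here is a minimal Java sketch that uses only `java.net.URI` as a stand-in for Hadoop's `Path.toUri`. The class and method names (`FilePathRendering`, `escaped`, `unescaped`) and the sample path are hypothetical, and the sketch assumes that the escaping matches Hadoop's for inputs like this one. It relies on the documented behavior of the multi-argument `URI` constructors, which always quote the percent character.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch; java.net.URI stands in for Hadoop's Path.toUri here.
final class FilePathRendering {
    // Percent-escaped rendering: a literal '%' in the raw path becomes "%25".
    static String escaped(String rawPath) {
        try {
            // The multi-argument constructors quote illegal characters and always quote '%'.
            return new URI(null, null, rawPath, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("not a valid path: " + rawPath, e);
        }
    }

    // Unescaped rendering: decodes the escapes again and returns the raw path.
    static String unescaped(String escapedPath) {
        return URI.create(escapedPath).getPath();
    }
}
```

With a raw path of `/warehouse/db/t/pt=50%/data-0.parquet`, `escaped` yields `/warehouse/db/t/pt=50%25/data-0.parquet`, and `unescaped` applied to that result returns the raw path again. A test that compares the two scan paths on a partition value containing `%` would therefore detect a change in rendering.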

A few optional, non-blocking nits:

PaimonScanSupport.scala — in toPartitionValueTemplate, the else if (isFilePathMetadataColumn(field.name)) null branch returns the same null as the trailing else (dead) and, unlike isPartitionValueField / filePathMetadataIndex, omits the !isPhysicalColumn guard. Safe today only because a physical column reaches partitionSchema solely as a real partition key; consider dropping the branch or adding the guard for consistency.
metadataFilePath = new Path(...).toUri.toString is computed for every data file even when no __paimon_file_path column is projected (filePathMetadataIndex < 0) and then discarded — could be guarded behind filePathMetadataIndex >= 0.
containsName / isFilePathMetadataColumn / isBucketMetadataColumn re-fetch SQLConf.get.resolver per call inside the per-field/per-file loop, though computePlan already binds resolver.
In isPaimonMetadataColumn, the containsName(PaimonMetadataColumns, name) clause is subsumed by the startsWith("__paimon_") prefix check, so the PaimonMetadataColumns set is redundant.
dataFile.externalPath().orElse(s"...") eagerly builds the fallback string even when externalPath() is present; orElseGet(...) would defer it.

None of these block the change.
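
The `orElse` versus `orElseGet` nit can be demonstrated in isolation. The following Java sketch assumes, based on the calls quoted in the review, that `externalPath()` returns a `java.util.Optional`; `LazyFallbackDemo`, `buildFallback`, and the counter are hypothetical and exist only to make the eager evaluation observable.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo; the counter only exists to make eager evaluation visible.
final class LazyFallbackDemo {
    static final AtomicInteger fallbackBuilds = new AtomicInteger();

    // Stand-in for building a fallback path string.
    static String buildFallback(String dir, String file) {
        fallbackBuilds.incrementAndGet();
        return dir + "/" + file;
    }

    // orElse: the argument is evaluated before the call, even if a value is present.
    static String eager(Optional<String> externalPath, String dir, String file) {
        return externalPath.orElse(buildFallback(dir, file));
    }

    // orElseGet: the supplier runs only when the Optional is empty.
    static String lazy(Optional<String> externalPath, String dir, String file) {
        return externalPath.orElseGet(() -> buildFallback(dir, file));
    }
}
```

Calling `eager(Optional.of("ext"), "d", "f")` returns `"ext"` but still increments `fallbackBuilds`, whereas `lazy(Optional.of("ext"), "d", "f")` leaves the counter unchanged. The same deferral idea applies to the `filePathMetadataIndex >= 0` guard suggested above: skip the work when its result would be discarded.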

lyne7-sc added 2 commits June 26, 2026 17:23

test: add paimon metadata columns suite

3995864

support paimon file-level metadata

b05f5a6

github-actions Bot added the thirdparty-paimon label Jun 26, 2026

SteNicholas requested a review from Copilot June 28, 2026 06:27

Copilot AI reviewed Jun 28, 2026

SteNicholas reviewed Jun 28, 2026

SteNicholas self-assigned this Jun 28, 2026

apply suggestions

ea31cda

SteNicholas reviewed Jun 29, 2026

lyne7-sc added 2 commits June 29, 2026 22:28

test: add tests for paimon v2 scan exec

fcb0914

fix paimon metadata column handling

7864167

apache deleted a comment from lyne7-sc Jun 30, 2026

SteNicholas approved these changes Jun 30, 2026

SteNicholas merged commit 8145cc9 into apache:master Jun 30, 2026
123 checks passed

SteNicholas merged 5 commits into apache:master from lyne7-sc:fix/paimon_meta
